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ABSTRACT 

Information about molecular interactions in DNA 
can be obtained from experimental melting tem- 
perature data by using mesoscopic statistical 
physics models. Here, we extend the technique to 
RNA and show that the new parameters correctly 
reproduce known properties such as the stronger 
hydrogen bonds of AU base pairs. We also were 
able to calculate a complete set of elastic constants 
for all 10 irreducible combinations of nearest neigh- 
bours (NNs). We believe that this is particularly 
useful as experimentally derived information about 
RNA elasticity is relatively scarce. The melting tem- 
perature prediction using the present model 
improves over those from traditional NN model, 
providing thus an alternative way to calculate 
these temperatures for RNA. Additionally, we calcu- 
lated the site-dependent base pair oscillation to 
explain why RNA shows larger oscillation ampli- 
tudes despite having stronger AU hydrogen bonds. 



INTRODUCTION 

The biological importance of the RNA molecule grows 
almost on a daily basis and with it the pressing need for 
computationally efficient models which would be capable 
of reproducing the intricate physical properties of this 
molecule. Unlike DNA, RNA has a complex secondary 
structure and easily forms loops and bulges. Yet even the 
canonical Watson-Crick base pairing for RNA has 
received little attention from mesoscopic models such as 
the Peyrard-Bishop (PB) models. 

Our understanding of the flexibility of RNA also lags 
behind that of DNA. Early studies indicated a large per- 
sistence length of RNA when compared with DNA (1), 
that is that RNA is more rigid than DNA. This is still 
largely the view which was supported by single molecule 
(2,3) and atomic force microscopy (AFM) measurements 



(4). On a base pair level, that is, on a microscopic scale, 
most of our knowledge comes from molecular dynamic 
simulations (5-8). Here, the picture of RNA versus 
DNA flexibility is less clear. For instance, Noy et al. (6) 
have found from their simulations that RNA is not always 
more rigid than DNA. These apparent discrepancies 
between macroscopic and microscopic flexibilities are 
not unexpected. Even for DNA, it has been difficult to 
reconcile microscopic flexibility with mesoscopic persist- 
ence length measurements (9,10). 

The PB model was proposed as a simplified way for the 
study of DNA denaturation in 1989 (11) and since then 
has found numerous applications. For instance, it was 
used recently to study multifractal denaturation process 
of DNA (12), its melting dynamics (13) and for promoter 
interaction (14,15). The model can also be used under a 
variety of theoretical frameworks such as path-integral 
formalism (16) or wavelet analysis (17), just to give a 
few examples. One attractive feature of the model is the 
ease by which one may modify the Hamiltonian to reflect 
certain properties of DNA such as the addition of solv- 
ation barriers (18,19), external forces (20) and other 
modified potentials (21). 

In essence, the PB model proposes a simplified Hamil- 
tonian which takes into account the hydrogen bond and 
the stacking interaction. These two interactions are 
modelled in such a way that they use only one variable, 
for which this model is commonly described as being a 
ID model although it is geometrically a 2D description. 
This single variable is what makes the Hamiltonian tract- 
able by established methods such as the transfer integral 
method (11). 

Although the PB model has seen much use for DNA, 
there has been, to our knowledge, little research focusing 
on its application to RNA. This, despite the overwhelming 
biological importance of RNA and its challenging physics. 
We understand that one of the main reasons for this situ- 
ation is the lack of realistic parameters which would 
capture the essential physics of the RNA molecule. 

Here, we use published melting temperatures for RNA 
sequences to obtain the model parameters for the PB 
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model. Unlike our previous work with DNA (22), obtain- 
ing the model parameters for RNA was much more diffi- 
cult due to the varying sources of experimental data. The 
experimental results used here were from Xia et al. (23) 
which include data from several other sources, that is, a 
large number of these temperatures were obtained under 
varying experimental conditions. Therefore, we had to 
take extra steps to ensure that the resulting parameters 
were of acceptable quality. 

One advantage of using the PB model for calculating its 
parameters is that we can understand qualitatively these 
parameters in terms of hydrogen bonds and flexibility 
(22,24). Additionally, we use the model to calculate 
average base pair openings. We show that despite of a 
stronger hydrogen bond of the RNA r(AU) base pairs, 
when compared with DNA d(AT), we obtain larger base 
pair openings which are entirely consistent with previous 
findings. 



MATERIALS AND METHODS 

Sequences used 

Experimental measurements are from Xia et al. (23), they 
report an experimental uncertainty of 1.3°C for 109 RNA 
duplexes at 1.0 M NaCl ranging from 4 to 14 bp in length 
(see Supplementary Tables SI SIII for sequence informa- 
tion). Average prediction from nearest-neighbour (NN) 
model is 1.6°C for two-state sequences. For the non-two 
state sequences, an average prediction of 2.4° C is obtained 
[see Table 1 of (23)]. In our optimizations, we use all 
sequences, including those reported as non-two state 
[Table 1 of (23)]. One should note that Xia et al. extend 
the work by Freier et al. (25) and several other sources. 
Therefore, these melting temperatures were sourced from 
different groups and were performed under a variety of 
different experimental conditions. 

We also used some sequences reported in the molecular 
dynamics study of Pan and MacKerell (26) to analyse the 
base opening of RNA and DNA. 

The mesoscopic model 

We used the model proposed by Peyrard and Bishop (11) 
with harmonic stacking interaction. This simpler model is 
being favoured in this work over the more elaborate an- 
harmonic model proposed in (27) as we found that the 
anharmonicity does not improve the melting temperature 
prediction (22). 

The main components of this model are the hydrogen 
bond represented by a Morse potential, here shown for the 
example case of a AU base pair, 

F(AU) = D AV ((T yM/XAV - l) 2 (1) 
and the stacking interaction 

u<ApC) = 0 au - 2 >'auJ'cg cos 6 + y 2 CG ), (2) 

here exemplified for a ApC dimer, where k is the elastic 
constant and 6 is a small angle (0.01 rad) introduced to 



avoid numerical problems in the partition function inte- 
gral (28). 

For further details on the model implementation, please 
see (22,29). For the integration of Equation (14) of (29), 
we use 400 points over the interval [—0.1 nm, 20.0 nm] and 
a cut-off of P = 10 of Equation (22) of (29). The calcula- 
tion of the thermal index r is performed at 370 K, please 
note that this temperature is unrelated to the temperatures 
obtained from the regression method. 

Optimization method 

The model parameters were obtained by an optimization 
method described in (22). For each sequence i, we calcu- 
late an adimensional melting index r, which results from 
the PB model and its calculation is described in detail in 
(22). We use this index r,- to calculate new melting tem- 
peratures T'j and compare them with the experimental 
temperatures T t . The model parameters are then varied 
until we minimize the squared differences 

r = f>;.-r,-) 2 . (3) 

We also refer in this work to an average melting tem- 
perature deviation 

1 - 

Length-dependent regression 

As in our previous work on DNA (22), we use two 
equations, 

T' i (N) = a 0 (N) + a x {N)x i , (5) 

where we split the sequences into separate regression 
groups of length N, that is, for each group of length N, 
we obtain two regression coefficients ao(N) and a\(N). For 
these, we calculate another linear regression 

a k (ty = b 0 , k + b hk N 1/2 , k — OA (6) 

since we have found that the coefficients ao,i are essentially 
linear with N l/2 (30). 

Single equation regression 

When the model parameters are very far from their 
optimized values, the length-dependent regression given 
in Equation (6) is poorly fitted to a straight line. This 
causes the coefficients bo,k and bij ( to oscillate widely 
and prevents the parameter minimization from 
reaching any optimized values. To circumvent this 
problem, we used a single linear regression for the predic- 
tion without length dependence for the initial rounds of 
minimization 

r;=/o+/iT,. (7) 

The coefficients /o,i are linear regression coefficients 
calculated for all available sequences obtained from the 
experimental melting temperatures T t and r,-. 
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Error estimate by varying the melting temperatures 

To obtain an error estimate for our calculations, we repeat 
the minimization procedure over a modified set of melting 
temperatures. This is achieved by adding a small random 
amount &T t (positive or negative) to the reported melting 
temperature Tj. The random STj follows a Gaussian dis- 
tribution such that the resulting standard deviation of the 
modified set matches the reported experimental uncer- 
tainty. This procedure is repeated a large number of 
times, always with a new set of modified temperatures. 

Error estimate by varying initial parameters 
For the initial round of minimizations, we found that 
varying the melting temperatures as described in the 
previous paragraphs prevents the minimization from 
reaching optimized values. In this case, we opted to vary 
the initial parameters instead and keeping the melting tem- 
peratures unchanged. 

Minimization procedure 

Due to the large experimental uncertainty of the experi- 
mental temperatures as well as complete lack of know- 
ledge about approximate parameters for RNA, we 
performed the optimization in three separate steps. 

(i) Initial minimization. As we had no previous know- 
ledge about approximate model parameters for 
RNA, we had to initiate the minimization procedure 
with some extra precautions. Also, we decided not 
to use the parameters optimized for DNA to avoid 
biasing our results. Instead, we opted for using the 
approximate parameters from (29) in exactly the 
same way as we did for DNA (22). Therefore, we 
used the fixed values Aau = 3.3333 x 10~ 2 nm, Xcg 
= 1.25xl0 _2 nm and a uniform k = 2.5eV/nm 2 . 
For each round of minimization, the initial values 
for the Morse potential were changed randomly 
and uniformly within ±20% of the following 
values D AU = 30meV, Z) CG =80meV. Only the 
Morse potentials were allowed to vary during the 
minimization, the remaining parameters were kept 
as fixed values and we used the regression 
Equation (7). For the error estimate, we varied 
only the initial Morse potentials which were 
repeated 1000 times. 

(ii) First all-parameter minimization. We used the re- 
sulting parameters of the previous minimization as 
initial values for this minimization where now all 
parameters, except 6, are allowed to vary. As in 
minimization (i), we used the regression Equation 
(7) and repeated the minimization 1000 times with 
modified melting temperatures. 

(iii) Final all-parameter minimization. Once we reach an 
acceptable minimization of x 2 in step (ii), we per- 
formed the final round of minimization using as 
input the variables from the previous round, calcu- 
lation (ii). In this step, the more sensitive 
length-dependent regression model was used, 
Equations (5) and (6). We repeated this procedure 
200 times with randomly modified data sets to 
obtain an error estimate. 



RESULTS AND DISCUSSION 

Parameter optimization 

Considering the large experimental uncertainties of the 
melting temperatures (23), we took a few extra precau- 
tions for the assessment of the model parameters. First, 
we performed the optimization by varying only the Morse 
potentials Z>au and Dqg and keeping k constant for all 
NNs. These results are shown as calculation (i) in Table 1 
and already hint at a stronger r(AU) base pair when 
compared with d(AT), however with a (AT) twice as 
large as the uncertainty reported by Xia et al. (23). 

We use these new Morse potentials of calculation (i) as 
input for the second round of optimization which now 
considers all parameters, calculation (ii) in Table 1. This 
optimization step is performed with a single regression, 
Equation (7) described in 'Materials and Methods' 
section. The Morse potential of the r(AU) is increased 
when compared with calculation (i). Unsurprisingly, 
given the single linear regression of Equation (7) (see 
'Materials and Methods' section), the calculated param- 
eter uncertainty is very large. 

The third and final round of parameter optimization, 
calculation (iii), uses as starting point all the parameters 
obtained in calculation (ii), but now using the 
length-dependent regression Equations (5) and (6). Here, 
we obtain a much smaller uncertainty and an average de- 
viation of melting temperatures (A T) which is comparable 
to the one obtained from the NN model of 1.6°C. Unlike 
the NN model (23), however, we do not exclude from our 
optimization-regression the non-two-state sequences. 
The average temperature deviation if we consider only 
the non-two-state sequences of (23) is 2.0°C, which is 
lower than the 2.4°C reported by Xia et al. (23). The 
complete regression coefficients are listed in Supplemen- 
tary Tables SIV and SV. 

Hydrogen bonds and base pair oscillations 

From Table 1, final minimization (iii), we clearly see that 
the hydrogen bond of 39(3) meV for r(AU) base pairs is 
stronger than for corresponding d(AT) base pairs of 
33(2) meV. The AU hydrogen bond was the object of 
several studies and determining its strength, which was 
found to be stronger than AT in DNA, was of consider- 
able experimental and theoretical difficulty (31-33). Also, 
it is difficult to reconcile the stronger AU hydrogen bond 
with larger base pair oscillations reported in molecular 
dynamic simulations, for example the simulations by 
Pan and MacKerell (26) for RNA, DNA and uracil 
incorporated in DNA. Larger base pair opening were 
also observed experimentally in nuclear magnetic reson- 
ance (NMR) measurements by Snoussi and Leroy (34) 
and Varnai et al. (35). 

To verify the amplitudes of the base pair oscillations, we 
calculated the average base pair opening (v) following the 
work by Zhang et al. (28). Figures 1 and 2 show these 
amplitudes for RNA and DNA calculated at the same 
temperatures. The DNA sequences were calculated with 
the parameters of (22). Please note that these temperatures 
are unrelated to the melting temperatures calculated from 
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Table 1. Morse parameters D and k 

Calculation D AV (meV) D C a (meV) X A u (10" 2 nm) X ca (10- 2 nm) (AT) 

Corresponding parameters for DNA at 1.020 M NaCl (22) 33(2) 70(2) 3.3(8) 0.96(2) 

i Parameters calculated with fixed k and X, varying 35(5) 67(5) 3.2 

the initial D by 20% 

ii First round of minimization using Equation (7) 38(10) 71(10) 3(1) 1.1(4) 1.9 

iii Final parameters after second round of minimization 39(3) 67(4) 3(1) 0.84(3) 1.7 

using Equations (5) and (6) 

Shown are the parameters of the three rounds of minimizations (i)-(iii) and their DNA counterparts. 
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Figure 1. Average opening, sequences r(GAGUACUC) (blue boxes, 
sequence I of (26)), d(GAGTACTC) (red bullets) and d(GAGUA 
CUC) (green triangles, sequence III of (26)). Shown are the average 
{y) calculated at (a) 180K and (b) 200 K, and (c) the NN elastic 
constant k. 
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Figure 2. Average opening, sequences r(GCGAGUACUCGC) (blue 
boxes, sequence II of (26)), d(GCGAGTACTCGC) (red bullets) and 
d(GCGAGUACUCGC) (green triangles, sequence IV of (26)). Shown 
are the average (y) calculated at (a) 200 K and (b) 220 K, and (c) the 
NN elastic constant k. 



the linear regression in the 'Materials and Methods' 
section and that for very short sequences one has to set 
the temperature at unrealistically low values. Nevertheless, 
these base openings provide a qualitative picture for a 
comparison of the base pair oscillations between DNA 
and RNA. The amplitudes for RNA were found to be 
generally larger, especially in Figure 2. We conclude 
from these results that despite the stronger r(AU) 
hydrogen bond, the elastic constants are such that we 
obtain generally larger base pair oscillation (y). In other 
words, the larger oscillations are entirely due to the 
stacking interaction. Also, we do not observe important 
qualitative variations of the base pair openings in the 
central parts of the sequence (Figures la and 2a) despite 
important variations in NN flexibility in Figures lb and 
2b. This is consistent with comparative studies by 
Steinert et al. (36) who concluded from NMR measure- 
ments that base pair openings are similar in DNA and 



RNA. One should note that comparison of the base pair 
openings to results from molecular dynamics is not 
straightforward since the PB model which is used in this 
work is geometrically 2D, lacking especially the twisting 
angle. The closest parameter from molecular dynamics 
which relates to the openings would be the stretching of 
the base pairs combined to the rise of the NNs (see (37) for 
the definition of slide and rise). 

The Morse potentials for the r(CG) hydrogen bonds are 
similar to the d(CG) base pairs within the calculated 
uncertainties (Table 1). The D cc for RNA were allowed 
to vary freely during all minimization rounds, and the fact 
that it resulted in values similar to the DNA Morse po- 
tential provides some confidence that the model is consist- 
ent for all these datasets. 

The Morse potential width parameters X show little 
change with either minimization calculations (ii) and 
(iii) (Table 1). This is similar to what we observed in our 
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previous results for DNA (22,38), including the large 
uncertainty for r(AU) which is similar to the one found 
for d(AT). 

With the results for the hydrogen bonds of r(AU), we 
are now in a position to explore for example the effects of 
uracil misincorporation in DNA. This is an interesting 
possibility which underlines the largely unexplored 
capabilities of the PB model. We simply replace the 
Morse potential of d(AT) base pairs with the stronger 
d(AU) potential in DNA. This mimics the occurrence of 
uracil in DNA which results from uracil misincorporation 
and is an important epigenetic factor (39). Figures 1 and 2 
show a much smaller amplitude of (y) for sequences with 
deoxyuracil (green triangles), when compared with 
canonical DNA. This shows that uracil misincorporation 
appears to have important effect on the DNA base pair 
oscillation amplitude. Such a finding may be used in bio- 
informatics applications to search for mutagenic hotspots 
or to understand the uracil repair mechanism (39). 

Elastic constants 

Table 2 shows the calculated elastic constants for RNA 
compared with their DNA counterparts. Out of 10, only 3 
elastic constants are similar in RNA and DNA. For the 
remaining seven, there is no clear pattern. For instance, 
the elastic constant of r(CpG) is almost half as large as 
d(CpG), while r(ApU) is twice as large as d(ApT). This 
supports the current view that there is very little similarity 
between the flexibility of RNA and DNA in general and 
that, depending on the actual sequence, RNA may even be 
less rigid than DNA (6). This finding is summarized by the 
calculated rank correlation (40) between the elastic con- 
stants of RNA and DNA which is only 0.37. 

Figures lc and 2c illustrate the differences in elastic 
constants along two different DNA and RNA sequences. 
For the sequence of Figure lc, five out of seven RNA NNs 
do have elastic constants in the same range as DNA. In 
contrast, for the sequence of Figure 2c, only 5 out of 1 1 
NNs have elastic constants in the same range as for DNA. 
The two highly flexible positions at the end correspond to 
CpG NNs, which do have a much smaller elastic constant 
than that for DNA (Table 2). These localized differences 
in elastic constants shown in Figures lc and 2c do not 
result in equally localized differences of the average base 
pair opening (y) (corresponding Figures la and b and 2a 
and b). The differences do exist but their effect is seen over 
several base pair position. For instance, the very soft 
r(CpG) positions in Figure 2c result in very large base 



pair opening towards the extremes of the RNA molecule 
as shown in Figure 2a and b. 

Applicability and limitations 

The applicability of results presented in this work covers 
their use in the framework of the PB model and for the 
prediction of melting temperatures. For predicting melting 
temperatures, the regression parameters were calculated 
for short RNA sequences in the range of 4-14 bp in 
length. We expect that predictions would be reliable 
within this range. For longer sequences however, which 
may melt in more than one step, the prediction quality 
is likely to suffer. 

Unlike the melting temperature prediction, the PB 
model itself can in principle be applied to sequences of 
any length. It is well established that the model captures 
the dynamics of multistep melting quite well, especially if 
the more sophisticated variants of the model are used such 
as the anharmonic model (27). The parameters presented 
in this work may be used in conjunction with those more 
elaborate models. For instance, if we use the present PB 
parameters together with the following anharmonic par- 
ameters p = 2.0 and a = 0.35 (30) and recalculate the re- 
gression, we obtain an average melting temperature 
deviation (AT) = 1.8°C which is slightly larger than for 
the PB model. 

The results presented in this work were derived from the 
experimental data with large experimental uncertainties 
and from diverse experimental sources (23). Therefore, 
the larger uncertainties reported here for RNA are to be 
expected when compared with more recent experimental 
data for DNA (41). Nevertheless, the convergence of the 
CG hydrogen bonds for RNA and DNA is an indication 
that the number of sequences reported by Xia et al. (23) is 
sufficient for an acceptable estimate of the RNA 
parameters. 

CONCLUSIONS 

We obtained the parameters of canonical base pairs for 
RNA within the framework of the PB model from experi- 
mental melting temperature calculations. The prediction 
of melting temperatures is better than for the NN model 
with the additional benefit that non-two-state sequences 
need not to be taken out of the regression analysis. The 
resulting Morse potentials indicate a stronger r(AU) 
hydrogen bond which is consistent with recent NMR 
measurements. The calculated flexibility parameters of 



Table 2. Elastic constants k for RNA and their corresponding counterparts 


in DNA 








XpY 


(k) (RNA) (k) (DNA) 


XpY 


(k) (RNA) 




(k) (DNA) 


CpG 


1.3(1) 2.5(2) 


ApA = UpU 


1.9(1) 




2.5(2) 


ApU 


2.2(2) 1.2(4) 


CpA = UpG 


2.43(9) 




3.1(2) 


ApG = CpU 


2.5(1) 4* 2.3(3) 


CpC = GpG 


2.8(2) 




2.1(1) 


UpA 


3.0(3) 3.4(6) 


ApC = GpU 


3.0(1) 




2.4(2) 


GpC 


3.1(2) 3.7(3) 


GpA = UpC 


3.3(1) 




3.1(2) 



Results are for the last minimization round, calculation (iii), and are given in eV/nm 2 . The arrows highlight the elastic constants of DNA and RNA 
which are similar within the calculated uncertainty. The results for DNA are from (22) for a salt concentration of 1.020 M NaCl. 
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RNA were found to have little in common with their 
DNA counterparts as was previously found from molecu- 
lar dynamics simulations. We expect that this newly 
obtained set of parameters opens many new possibilities 
for applying the PB model to RNA on the same level as 
happened for DNA. 

SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Tables I-V. 
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